Unsupervised Identification of Persian Compound Verbs

Authors

  • Mohammad Sadegh Rasooli
  • Heshaam Faili
  • Behrouz Minaei-Bidgoli
Abstract

One of the main tasks related to multiword expressions (MWEs) is compound verb identification. Many studies have addressed unsupervised identification of multiword verbs in various languages, but no notable work has yet been done for Persian. Persian multiword verbs (known as compound verbs) are a kind of light verb construction (LVC) with considerable syntactic flexibility: the distance between the light verb and the nonverbal element is unrestricted, and the nonverbal element can be inflected. These characteristics make the task very difficult in Persian. This paper proposes two unsupervised methods for automatically detecting compound verbs in Persian. The first is a bootstrapping method that extends the pointwise mutual information (PMI) measure. The second uses the K-means clustering algorithm. Our experiments show that both approaches outperform a baseline that uses PMI as its association metric.
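
The PMI association measure named in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `pmi_scores`, the transliterated toy pairs, and the base-2 logarithm are all assumptions made for the example. It scores each (nonverbal element, light verb) candidate pair by PMI(w, v) = log2( P(w, v) / (P(w) P(v)) ), with probabilities estimated from raw counts.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Compute PMI for each distinct (nonverbal_element, verb) pair.

    `pairs` is a list of co-occurrence observations; probabilities are
    plain relative frequencies over that list (an assumption of this sketch).
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    verb_counts = Counter(v for _, v in pairs)
    total = len(pairs)

    scores = {}
    for (w, v), count in pair_counts.items():
        p_wv = count / total
        p_w = word_counts[w] / total
        p_v = verb_counts[v] / total
        scores[(w, v)] = math.log2(p_wv / (p_w * p_v))
    return scores


# Hypothetical toy observations (transliterated, invented for illustration).
toy_pairs = [
    ("harf", "zadan"),
    ("harf", "zadan"),
    ("ketab", "kharidan"),
    ("ketab", "zadan"),
]
toy_scores = pmi_scores(toy_pairs)

# Rank candidate pairs from strongest to weakest association.
ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In a baseline of this kind, candidates whose score exceeds some threshold would be proposed as compound verbs; the threshold and the way candidate pairs are extracted from a corpus are not specified in the abstract and are left out here.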

Similar resources

Corpus based Semi-Automatic Extraction of Persian Compound Verbs and their Relations

WordNet is now used in natural language processing as one of the major linguistic resources. Having such a resource for Persian helps researchers in computational linguistics and natural language processing to develop more accurate, better-performing systems. In this research, we propose a model for semi-automatic construction of a Persian wordnet of verbs. Compound ve...

Event Types in the Generative Lexicon: Implications for Persian Compound Verbs

Compounding is a highly productive phenomenon in Persian and compound verbs have been the subject of several interesting studies in a variety of frameworks. In this paper, I look at a small subset of these verbs and try to capture some of the constraints and generalities involved in the compounding process. The theoretical framework of the paper is the Generative Lexicon (GL) as formalized in P...

Split complex predicates in Persian

Complex predicates, or compound verbs, constitute a major portion of verbal forms in the Persian language. They are normally formed using a noun, adjective, preposition, or prepositional phrase, followed by a light verb. Unlike many other languages that employ such constructions, Persian allows the two components to become separated. This paper will investigate where complex predicates syntacti...

Extending the coverage of a MWE database for Persian CPs exploiting valency alternations

PersPred is a manually elaborated multilingual syntactic and semantic lexicon for Persian Complex Predicates (CPs), also referred to as "Light Verb Constructions" (LVCs) or "Compound Verbs". CPs constitute the regular and most common way of expressing verbal concepts in Persian, which has only around 200 simplex verbs. CPs can be defined as multi-word sequences formed by a verb and a non-v...

Using Noun Similarity to Adapt an Acceptability Measure for Persian Light Verb Constructions

Light verb constructions (LVCs), such as take a walk and make a decision, are a common subclass of multiword expressions (MWEs), whose distinct syntactic and semantic properties call for a special treatment within a computational system. In particular, LVCs are formed semi-productively: often a semantically-general verb (such as take) combines with a number of semantically-similar nouns to form...

Publication date: 2011